The MSSP NCNM Presentation - Professor: Haviland Wright
Group 1: Jimmy Ye, Jinyu Li, Yuli Jin
Group 2: Daniel Xu, Kayla Choi, Nancy Shen
Group 3: Mi Zhang, Boyu Chen, Shicong Wang, Biyao Zhang
Group 4: Keliang Xu, Yingjie Wang, James He, Ruining Jia
Alison Turner: a Community Development Planner at NCNMEDD and recent MSSP graduate
Aidan O’Hara: working with Alison since late July
Allen Razdow: founder and president of True Engineering Technology, LLC and originator of TrueNumbers
Historically, few resources to acquire grants
Trouble successfully administering grants to complete projects
Currently, at a turning point:
New pandemic-related dollars flowing to the region; have capital to spend on new projects
Two big issues: broadband access and outmigration
Census: they don’t collect much data in their own office
They would like recommendations on the gaps or insufficiencies they see in census data for the region.
Demographics (categorical).
Income (numerical), range: 0-1,000,000,000,000 (unsure if this is the maximum); gross receipts taxed.
Unemployment rate (numerical).
GDP (numerical).
Number of business establishments (numerical).
Ultimate Goal:
TrueNumbers database that can be accessed by NCNMEDD and local government staff to assist with grant applications.
An analysis of the data from the region: we have fairly low census response rates, which could lead to data quality issues.
If data quality issues exist, come up with supplemental sources of data to improve inferences made about the region.
Focus for this semester:
TrueNumbers
Dive into what the census is, why it’s important, and how low response rates may pose an issue.
Streamline the data acquisition, organization, and analysis process.
Using the Tnum package, we created a function to extract county-level census data.
Visualize with ggplot to check the relationships between variables.
Build models to gain deeper insight into the grant situation in New Mexico.
Our data comes from the ACS (American Community Survey).
The ACS is a large demographic survey collected throughout the year using mailed questionnaires, telephone interviews, and visits from Census Bureau field representatives to about 3.5 million household addresses annually.
Data availability for geographic areas differs by population size: 1-year estimates are available for areas with a population of 65,000 or more, while 5-year estimates are available for all areas.
Estimates are produced for:
- demographic characteristics (sex, age);
- social characteristics (school enrollment, educational attainment);
- economic characteristics (employment status, commuting to work);
- housing characteristics (housing occupancy, units in structure).
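The population rule for ACS estimates can be written as a small check. This is a minimal sketch, assuming only the 65,000 threshold stated above; the function name `acs_estimates_available` and the example populations are hypothetical and are not real county figures.

```r
# Sketch: report which ACS estimate series are published for an area,
# based only on the population threshold described in the text.
acs_estimates_available <- function(population) {
  stopifnot(is.numeric(population), all(population >= 0))
  # 5-year estimates exist for all areas; 1-year estimates need 65,000+ people.
  ifelse(population >= 65000, "1-year and 5-year", "5-year only")
}

# Made-up populations for illustration only (not real county data).
example_pop <- c(small_area = 4500, threshold_area = 65000, large_area = 150000)
acs_estimates_available(example_pop)
```

For a region made up mostly of small counties, a check like this suggests that the 5-year series is the one that covers every area.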
In this presentation, we focused on:
Modify dataset:
- clean_tag: simplifies the subject and tag columns
- get_county: simplifies the county column
Function for preparing the data for further analysis:
- get_county_data: retrieves the data for the different counties
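As an illustration of the kind of column simplification described above, here is a minimal sketch. It is not the project's get_county: the helper name `simplify_county` is hypothetical, and the input format ("<Name> County, <State>") is an assumption about how the raw county labels look.

```r
# Hypothetical helper (NOT the project's get_county): strips the
# " County, <State>" suffix from a geography label.
simplify_county <- function(geo_label) {
  # Drop everything from " County" onward, then trim stray whitespace.
  trimws(sub(" +County.*$", "", geo_label))
}

# Assumed input format; the real column may look different.
raw <- c("Colfax County, New Mexico", "Los Alamos County, New Mexico")
simplify_county(raw)
```

Keeping the cleaning step in its own small function makes it easy to reuse before joining or plotting by county.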
A few steps are still needed to get the data:
Connect to the BU VPN: vpn.bu.edu
Run the following code and wait a few seconds:
- source(file = 'data_clean/mexico_screen_function.R')
- data <- get_county_data()
- data %>% view()